LS4003 R worksheet 2
Sleep quality
Introduction to the data
For this worksheet, you will need the Sleep_health_and_lifestyle_dataset.csv file from the Canvas page.
This dataset contains values for sleep quality and lifestyle factors. It is artificial data generated for illustrative purposes.
The dataset has the following columns:
| Column | Data |
|---|---|
| Age | Age of the individual |
| Sex | Sex of the individual (Male/Female) |
| Occupation | Occupation of the individual |
| sleepduration | Average length of sleep (hours) |
| sleepquality | Average sleep quality score |
| activitylevel | Value assigned based on average level of physical activity (minutes/day) |
| stress | Self-rated score of how stressed the individual feels |
| heartrate | Resting heart rate of the individual (beats per minute) |
| steps | Average number of steps taken per day |
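A minimal sketch of loading the data is shown below. It assumes the CSV is in your working directory and that its header row uses exactly the column names listed in the table (if you get an error, compare against `names()` of what was read). The helper name `load_sleep_data` is hypothetical. Reading with `stringsAsFactors = TRUE` turns `Sex` and `Occupation` into factors, which is convenient for grouping later.

```r
# Hypothetical helper: read the worksheet CSV and check the expected columns exist.
# Assumption: the CSV header uses the same column names as the worksheet's table.
load_sleep_data <- function(path = "Sleep_health_and_lifestyle_dataset.csv") {
  expected <- c("Age", "Sex", "Occupation", "sleepduration", "sleepquality",
                "activitylevel", "stress", "heartrate", "steps")
  # Text columns (Sex, Occupation) become factors
  df <- read.csv(path, stringsAsFactors = TRUE)
  missing <- setdiff(expected, names(df))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  df
}
```

For example, `sleep <- load_sleep_data()` followed by `str(sleep)` shows the type of each column.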
The task
The task for this worksheet is to determine whether there are any correlations between the variables in the dataset and, if so, what they are.
You should plot any correlations you find on a graph, such as the one below:
A graph showing the correlation between sleep duration and resting heart rate, separated by sex.
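One way to draw a graph like this in base R is sketched below. It is not necessarily how the example graph was made (ggplot2 is a common alternative). `plot_by_sex` is a hypothetical helper that assumes the data frame uses the column names from the table; it fits a separate straight line (`lm`) for each sex and returns those fits so you can inspect the slopes.

```r
# Hypothetical helper: scatter plot of y against x, one colour per group,
# with a separate least-squares line fitted for each group.
plot_by_sex <- function(df, x = "sleepduration", y = "heartrate", group = "Sex") {
  groups <- factor(df[[group]])
  lev <- levels(groups)
  cols <- seq_along(lev) + 1          # palette colours 2, 3, ... (colour 1 is black)
  plot(df[[x]], df[[y]], col = cols[as.integer(groups)], pch = 19,
       xlab = x, ylab = y)
  fits <- vector("list", length(lev))
  names(fits) <- lev
  for (i in seq_along(lev)) {
    # Data for this group only
    sub_df <- data.frame(xv = df[[x]][groups == lev[i]],
                         yv = df[[y]][groups == lev[i]])
    fits[[i]] <- lm(yv ~ xv, data = sub_df)
    abline(fits[[i]], col = cols[i])  # draw this group's fitted line
  }
  legend("topright", legend = lev, col = cols, pch = 19)
  invisible(fits)
}
```

`fits <- plot_by_sex(sleep)` draws the graph, and `lapply(fits, coef)` shows the intercept and slope for each sex. Change the `x` and `y` arguments to look at other pairs of variables.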
There are lots of different variables in this dataset, so do explore: what can you find out?
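To get an overview before plotting, you could test the correlation for every pair of numeric columns. The sketch below uses `cor.test()` with Pearson's correlation by default (pass `method = "spearman"` for a rank-based measure). The helper name `correlation_table` is hypothetical.

```r
# Hypothetical helper: correlation test for every pair of numeric columns,
# sorted so the strongest correlations (largest |r|) come first.
correlation_table <- function(df, method = "pearson") {
  num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
  pairs <- combn(names(num), 2)                     # every pair of column names
  out <- data.frame(var1 = pairs[1, ], var2 = pairs[2, ],
                    r = NA_real_, p = NA_real_, stringsAsFactors = FALSE)
  for (k in seq_len(ncol(pairs))) {
    test <- cor.test(num[[pairs[1, k]]], num[[pairs[2, k]]], method = method)
    out$r[k] <- unname(test$estimate)   # correlation coefficient
    out$p[k] <- test$p.value
  }
  out[order(-abs(out$r)), ]
}
```

If the seven numeric-looking columns in the table are read as numbers, that is 21 pairs, so some small p-values are expected by chance alone; plot any correlation before drawing conclusions from it.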
Extension task
Correlation does not always mean causation, and the Spurious correlations website has many examples that demonstrate this.
If you go to the website and click on a correlation, you should get a table with the raw data.
You can then:
1. Copy and paste this table into Excel.
2. Save your Excel spreadsheet as a CSV file.
3. Read the CSV file into R.
4. Plot and test the correlations.
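Steps 3 and 4 might look something like the sketch below. The file name `spurious.csv` and the column names are placeholders for whatever you saved, and `test_correlation` is a hypothetical helper. It assumes the two series are already in separate columns; if they are not, they will need rearranging first.

```r
# Hypothetical helper: scatter plot of two columns with a fitted line,
# then a Pearson correlation test.
test_correlation <- function(df, xcol, ycol) {
  stopifnot(xcol %in% names(df), ycol %in% names(df))
  x <- as.numeric(df[[xcol]])
  y <- as.numeric(df[[ycol]])
  plot(x, y, xlab = xcol, ylab = ycol, pch = 19)
  abline(lm(y ~ x))   # least-squares straight line through the points
  cor.test(x, y)      # Pearson's r with a p-value
}

# Example usage (placeholder file and column names):
# spurious <- read.csv("spurious.csv")
# test_correlation(spurious, "Series.A", "Series.B")
```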
Getting the data from the table into R may not be as straightforward as you’d think.
You might find that R doesn’t like spaces in column or row names, or that your rows and columns are the wrong way around.
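A sketch of one fix is below. It assumes the pasted table has one row per data series (with its label in the first column) and one column per year, and that values may contain thousands separators such as `1,234`. `make.names()` is the base R function that turns labels like `Series A` into syntactic names like `Series.A` (the same conversion `read.csv()` applies to headers by default), and `t()` swaps rows and columns. The helper name `rows_to_columns` is hypothetical.

```r
# Hypothetical helper: turn a table with one row per series and one column
# per year into one column per series and one row per year.
rows_to_columns <- function(df) {
  # Series labels such as "Series A" become syntactic names such as "Series.A"
  labels <- make.names(as.character(df[[1]]), unique = TRUE)
  # Transpose the remaining columns so that each year becomes a row
  values <- t(as.matrix(df[-1]))
  out <- as.data.frame(values, stringsAsFactors = FALSE)
  names(out) <- labels
  # Strip thousands separators and convert every series to numbers
  out[] <- lapply(out, function(v) as.numeric(gsub(",", "", v)))
  # read.csv() prefixes numeric headers with "X" (e.g. "X2000"); remove it
  out$year <- sub("^X", "", rownames(values))
  rownames(out) <- NULL
  out
}
```

After `wide <- read.csv("spurious.csv")`, `fixed <- rows_to_columns(wide)` gives one column per series plus a `year` column, ready for plotting and `cor.test()`.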